Información del conjunto de datos:</p>
Esta investigación se centró en el caso de los pagos por defecto de los clientes de Banco Uno y compara la precisión predictiva de la probabilidad de incumplimiento entre seis métodos de minería de datos.
De la perspectiva de la gestión de riesgos, el resultado de la precisión predictiva de la estimación La probabilidad de incumplimiento será más valiosa que el resultado binario de la clasificación de Clientes creíbles o no creíbles.
Debido a que se desconoce la probabilidad real de incumplimiento, este estudio presentó el novedoso método de clasificación de suavizado para estimar la probabilidad real de defecto. Con la probabilidad real de incumplimiento como variable de respuesta (Y), y la probabilidad predictiva de incumplimiento como la variable independiente (X), la simple lineal El resultado de la regresión (Y = A + BX) muestra que el modelo de pronóstico producido por la red neuronal tiene el coeficiente de determinación más alto; su intersección de regresión (A) es cercano a cero y coeficiente de regresión (B) a uno. Por lo tanto, entre los seis datos de minería técnicas, la red neuronal artificial es la única que puede estimar con precisión el valor real probabilidad de incumplimiento.
Información de los atributos:
NOTA: La siguiente es información actualizada del autor de la fuente.
Esta investigación empleó una variable binaria, pago predeterminado (Sí = 1, No = 0), como variable de respuesta. Este estudio revisó la literatura y utilizó las siguientes 23 variables como variables explicativas:
X1: Monto del crédito otorgado (dólar NT): incluye tanto al consumidor individual crédito y su crédito familiar (complementario).
X2: Género (1 = masculino; 2 = femenino).
X3: Educación (1 = posgrado; 2 = universidad; 3 = bachillerato; 0, 4, 5, 6 = otros).
X4: Estado civil (1 = casado; 2 = soltero; 3 = divorciado; 0 = otros).
X5: Edad (año).
X6 - X11: Historial de pagos pasados. Realizamos un seguimiento de los últimos registros de pagos mensuales (desde Abril a septiembre de 2005) de la siguiente manera: X6 = el estado de reembolso en septiembre de 2005; X7 = el estado de reembolso en agosto de 2005; . . .; X11 = el estado de reembolso en abril de 2005.
La escala de medición para el estado de reembolso es:
2: Sin consumo; -1: pagado en su totalidad; 0: El uso de crédito renovable; 1 = retraso en el pago por un mes; 2 = retraso en el pago de dos meses; . . .; 8 = retraso en el pago de ocho meses; 9 = retraso en el pago de nueve meses o más.
X12-X17: Monto del extracto de la factura (dólar NT). X12 = monto del extracto de la factura en Septiembre de 2005; X13 = monto del estado de cuenta en agosto de 2005; . . .; X17 = cantidad de estado de cuenta en abril de 2005.
X18-X23: Monto del pago anterior (dólar NT). X18 = monto pagado en septiembre, 2005; X19 = monto pagado en agosto de 2005; . . .; X23 = monto pagado en abril de 2005.
Y: comportamiento del cliente; Y = 0 entonces no predeterminado, Y = 1 luego predeterminado "
from sqlalchemy import create_engine
import pymysql
import pandas as pd
connection = pymysql.connect(host='data-analytics-2018.cbrosir2cswx.us-east-1.rds.amazonaws.com',
user='deepAnalytics',
password='Sqltask1234!',
database='Credit',
charset='utf8mb4',
cursorclass=pymysql.cursors.DictCursor)
df = pd.read_sql('SELECT * FROM credit', con=connection)
C:\Users\bryan\anaconda3\lib\site-packages\pandas\io\sql.py:761: UserWarning: pandas only support SQLAlchemy connectable(engine/connection) ordatabase string URI or sqlite3 DBAPI2 connectionother DBAPI2 objects are not tested, please consider using SQLAlchemy warnings.warn(
df. to_csv ( 'BancoUno.csv' , header = False , index = False )
df.head()
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | ... | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | Y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month |
| 1 | 20000 | female | university | 1 | 24 | 2 | 2 | -1 | -1 | -2 | ... | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | default |
| 2 | 120000 | female | university | 2 | 26 | -1 | 2 | 0 | 0 | 0 | ... | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | default |
| 3 | 90000 | female | university | 2 | 34 | 0 | 0 | 0 | 0 | 0 | ... | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | not default |
| 4 | 50000 | female | university | 1 | 37 | 0 | 0 | 0 | 0 | 0 | ... | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | not default |
5 rows × 24 columns
credit=df
credit.head()
| X1 | X2 | X3 | X4 | X5 | X6 | X7 | X8 | X9 | X10 | ... | X15 | X16 | X17 | X18 | X19 | X20 | X21 | X22 | X23 | Y | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month |
| 1 | 20000 | female | university | 1 | 24 | 2 | 2 | -1 | -1 | -2 | ... | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | default |
| 2 | 120000 | female | university | 2 | 26 | -1 | 2 | 0 | 0 | 0 | ... | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | default |
| 3 | 90000 | female | university | 2 | 34 | 0 | 0 | 0 | 0 | 0 | ... | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | not default |
| 4 | 50000 | female | university | 1 | 37 | 0 | 0 | 0 | 0 | 0 | ... | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | not default |
5 rows × 24 columns
conda install -c conda-forge pandas-profiling
import pandas_profiling
pandas_profiling.ProfileReport(credit)
credit.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 3670 entries, 0 to 3669 Data columns (total 24 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 X1 3670 non-null object 1 X2 3670 non-null object 2 X3 3670 non-null object 3 X4 3670 non-null object 4 X5 3670 non-null object 5 X6 3670 non-null object 6 X7 3670 non-null object 7 X8 3670 non-null object 8 X9 3670 non-null object 9 X10 3670 non-null object 10 X11 3670 non-null object 11 X12 3670 non-null object 12 X13 3670 non-null object 13 X14 3670 non-null object 14 X15 3670 non-null object 15 X16 3670 non-null object 16 X17 3670 non-null object 17 X18 3670 non-null object 18 X19 3670 non-null object 19 X20 3670 non-null object 20 X21 3670 non-null object 21 X22 3670 non-null object 22 X23 3670 non-null object 23 Y 3670 non-null object dtypes: object(24) memory usage: 688.2+ KB
credit.columns
Index(['X1', 'X2', 'X3', 'X4', 'X5', 'X6', 'X7', 'X8', 'X9', 'X10', 'X11',
'X12', 'X13', 'X14', 'X15', 'X16', 'X17', 'X18', 'X19', 'X20', 'X21',
'X22', 'X23', 'Y'],
dtype='object')
credit.rename({'X1':'LIMIT_BAL'}, axis=1, inplace=True)
credit.rename({'X2':'SEX'}, axis=1, inplace=True)
credit.rename({'X3':'EDUCATION'}, axis=1, inplace=True)
credit.rename({'X4':'MARRIAGE'}, axis=1, inplace=True)
credit.rename({'X5':'AGE'}, axis=1, inplace=True)
credit.rename({'X6':'PAY_0'}, axis=1, inplace=True)
credit.rename({'X7':'PAY_2'}, axis=1, inplace=True)
credit.rename({'X8':'PAY_3'}, axis=1, inplace=True)
credit.rename({'X9':'PAY_4'}, axis=1, inplace=True)
credit.rename({'X10':'PAY_5'}, axis=1, inplace=True)
credit.rename({'X11':'PAY_6'}, axis=1, inplace=True)
credit.rename({'X12':'BILL_AMT1'}, axis=1, inplace=True)
credit.rename({'X13':'BILL_AMT2'}, axis=1, inplace=True)
credit.rename({'X14':'BILL_AMT3'}, axis=1, inplace=True)
credit.rename({'X15':'BILL_AMT4'}, axis=1, inplace=True)
credit.rename({'X16':'BILL_AMT5'}, axis=1, inplace=True)
credit.rename({'X17':'BILL_AMT6'}, axis=1, inplace=True)
credit.rename({'X18':'PAY_AMT1'}, axis=1, inplace=True)
credit.rename({'X19':'PAY_AMT2'}, axis=1, inplace=True)
credit.rename({'X20':'PAY_AMT3'}, axis=1, inplace=True)
credit.rename({'X21':'PAY_AMT4'}, axis=1, inplace=True)
credit.rename({'X22':'PAY_AMT5'}, axis=1, inplace=True)
credit.rename({'X23':'PAY_AMT6'}, axis=1, inplace=True)
credit.rename({'Y':'default payment next month'}, axis=1, inplace=True)
credit.columns
Index(['LIMIT_BAL', 'SEX', 'EDUCATION', 'MARRIAGE', 'AGE', 'PAY_0', 'PAY_2',
'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2',
'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1',
'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5', 'PAY_AMT6',
'default payment next month'],
dtype='object')
credit
| LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month |
| 1 | 20000 | female | university | 1 | 24 | 2 | 2 | -1 | -1 | -2 | ... | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | default |
| 2 | 120000 | female | university | 2 | 26 | -1 | 2 | 0 | 0 | 0 | ... | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | default |
| 3 | 90000 | female | university | 2 | 34 | 0 | 0 | 0 | 0 | 0 | ... | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | not default |
| 4 | 50000 | female | university | 1 | 37 | 0 | 0 | 0 | 0 | 0 | ... | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | not default |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 3665 | 220000 | female | university | 1 | 32 | 0 | 0 | 0 | 0 | 0 | ... | 208355 | 213015 | 217475 | 7200 | 9000 | 10000 | 8000 | 8010 | 8500 | not default |
| 3666 | 70000 | female | university | 2 | 34 | 1 | 2 | 2 | 2 | 0 | ... | 26456 | 28361 | 31873 | 1500 | 2900 | 0 | 2500 | 4000 | 0 | not default |
| 3667 | 120000 | male | university | 2 | 37 | -1 | 2 | 0 | 0 | 0 | ... | 17901 | 19608 | 19143 | 1000 | 1600 | 800 | 2000 | 0 | 1600 | default |
| 3668 | 180000 | female | university | 2 | 32 | 0 | 0 | 0 | 0 | 0 | ... | 31057 | 29052 | 25933 | 1582 | 30000 | 1000 | 1000 | 1000 | 1000 | not default |
| 3669 | 50000 | female | high school | 1 | 57 | 0 | 0 | 0 | 0 | 0 | ... | 48319 | 48449 | 49656 | 2500 | 2000 | 2000 | 1746 | 2000 | 1800 | not default |
3670 rows × 24 columns
credit.drop_duplicates(inplace=True)
credit
| LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month |
| 1 | 20000 | female | university | 1 | 24 | 2 | 2 | -1 | -1 | -2 | ... | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | default |
| 2 | 120000 | female | university | 2 | 26 | -1 | 2 | 0 | 0 | 0 | ... | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | default |
| 3 | 90000 | female | university | 2 | 34 | 0 | 0 | 0 | 0 | 0 | ... | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | not default |
| 4 | 50000 | female | university | 1 | 37 | 0 | 0 | 0 | 0 | 0 | ... | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | not default |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2393 | 50000 | male | high school | 1 | 32 | 2 | 3 | 2 | 2 | 2 | ... | 41526 | 42209 | 44960 | 1500 | 1600 | 1700 | 1500 | 3600 | 0 | default |
| 2394 | 20000 | female | high school | 2 | 49 | 0 | 0 | 2 | 0 | -1 | ... | 10400 | 1150 | 0 | 3008 | 0 | 600 | 1150 | 0 | 0 | not default |
| 2395 | 130000 | female | university | 2 | 24 | 1 | -2 | -1 | -1 | -1 | ... | -3 | 998 | 937 | 0 | 3372 | 0 | 1001 | 939 | 1013 | not default |
| 2396 | 110000 | female | high school | 1 | 27 | 0 | 0 | 0 | 0 | 0 | ... | 47316 | 47175 | 47141 | 2326 | 3000 | 1807 | 1698 | 1911 | 1808 | not default |
| 2397 | 200000 | male | university | 1 | 29 | 0 | 0 | 0 | 2 | 2 | ... | 48367 | 49366 | 47539 | 2078 | 3900 | 1900 | 1904 | 12 | 5224 | not default |
2397 rows × 24 columns
credit.drop([0],axis=0, inplace=True)
credit
| LIMIT_BAL | SEX | EDUCATION | MARRIAGE | AGE | PAY_0 | PAY_2 | PAY_3 | PAY_4 | PAY_5 | ... | BILL_AMT4 | BILL_AMT5 | BILL_AMT6 | PAY_AMT1 | PAY_AMT2 | PAY_AMT3 | PAY_AMT4 | PAY_AMT5 | PAY_AMT6 | default payment next month | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 20000 | female | university | 1 | 24 | 2 | 2 | -1 | -1 | -2 | ... | 0 | 0 | 0 | 0 | 689 | 0 | 0 | 0 | 0 | default |
| 2 | 120000 | female | university | 2 | 26 | -1 | 2 | 0 | 0 | 0 | ... | 3272 | 3455 | 3261 | 0 | 1000 | 1000 | 1000 | 0 | 2000 | default |
| 3 | 90000 | female | university | 2 | 34 | 0 | 0 | 0 | 0 | 0 | ... | 14331 | 14948 | 15549 | 1518 | 1500 | 1000 | 1000 | 1000 | 5000 | not default |
| 4 | 50000 | female | university | 1 | 37 | 0 | 0 | 0 | 0 | 0 | ... | 28314 | 28959 | 29547 | 2000 | 2019 | 1200 | 1100 | 1069 | 1000 | not default |
| 5 | 50000 | male | university | 1 | 57 | -1 | 0 | -1 | 0 | 0 | ... | 20940 | 19146 | 19131 | 2000 | 36681 | 10000 | 9000 | 689 | 679 | not default |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2393 | 50000 | male | high school | 1 | 32 | 2 | 3 | 2 | 2 | 2 | ... | 41526 | 42209 | 44960 | 1500 | 1600 | 1700 | 1500 | 3600 | 0 | default |
| 2394 | 20000 | female | high school | 2 | 49 | 0 | 0 | 2 | 0 | -1 | ... | 10400 | 1150 | 0 | 3008 | 0 | 600 | 1150 | 0 | 0 | not default |
| 2395 | 130000 | female | university | 2 | 24 | 1 | -2 | -1 | -1 | -1 | ... | -3 | 998 | 937 | 0 | 3372 | 0 | 1001 | 939 | 1013 | not default |
| 2396 | 110000 | female | high school | 1 | 27 | 0 | 0 | 0 | 0 | 0 | ... | 47316 | 47175 | 47141 | 2326 | 3000 | 1807 | 1698 | 1911 | 1808 | not default |
| 2397 | 200000 | male | university | 1 | 29 | 0 | 0 | 0 | 2 | 2 | ... | 48367 | 49366 | 47539 | 2078 | 3900 | 1900 | 1904 | 12 | 5224 | not default |
2396 rows × 24 columns
credit.duplicated().sum()
0
credit.shape
(2396, 24)
credit.dtypes
LIMIT_BAL object SEX object EDUCATION object MARRIAGE object AGE object PAY_0 object PAY_2 object PAY_3 object PAY_4 object PAY_5 object PAY_6 object BILL_AMT1 object BILL_AMT2 object BILL_AMT3 object BILL_AMT4 object BILL_AMT5 object BILL_AMT6 object PAY_AMT1 object PAY_AMT2 object PAY_AMT3 object PAY_AMT4 object PAY_AMT5 object PAY_AMT6 object default payment next month object dtype: object
credit[['LIMIT_BAL', 'MARRIAGE', 'AGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1', 'BILL_AMT2',
'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4', 'PAY_AMT5',
'PAY_AMT6']]=credit[['LIMIT_BAL','MARRIAGE', 'AGE', 'PAY_0', 'PAY_2', 'PAY_3', 'PAY_4', 'PAY_5', 'PAY_6', 'BILL_AMT1',
'BILL_AMT2', 'BILL_AMT3', 'BILL_AMT4', 'BILL_AMT5', 'BILL_AMT6', 'PAY_AMT1', 'PAY_AMT2', 'PAY_AMT3', 'PAY_AMT4',
'PAY_AMT5', 'PAY_AMT6']].astype("int")
credit.dtypes
LIMIT_BAL int32 SEX object EDUCATION object MARRIAGE int32 AGE int32 PAY_0 int32 PAY_2 int32 PAY_3 int32 PAY_4 int32 PAY_5 int32 PAY_6 int32 BILL_AMT1 int32 BILL_AMT2 int32 BILL_AMT3 int32 BILL_AMT4 int32 BILL_AMT5 int32 BILL_AMT6 int32 PAY_AMT1 int32 PAY_AMT2 int32 PAY_AMT3 int32 PAY_AMT4 int32 PAY_AMT5 int32 PAY_AMT6 int32 default payment next month object dtype: object
pandas_profiling.ProfileReport(credit)
Summarize dataset: 0%| | 0/5 [00:00<?, ?it/s]
Generate report structure: 0%| | 0/1 [00:00<?, ?it/s]
Render HTML: 0%| | 0/1 [00:00<?, ?it/s]